NASA  Contractor  Report  191552 
ICASE  INTERIM  REPORT  NO.  25 


AD-A274  649 



ICASE 


EXPERIENCE  WITH  PARAMETRIC  BINARY  DISSECTION 


Shahid  H.  Bokhari 


NASA  Contract  No.  NAS  1-19480 
October  1993 


Institute  for  Computer  Applications  in  Science  and  Engineering 
NASA  Langley  Research  Center 
Hampton,  Virginia  23681-0001 

Operated  by  the  Universities  Space  Research  Association 


National  Aeronautics  and 
Space  Administration 

Langley  Research  Center 

Hampton, Virginia 23681-0001


94-01110 



Experience  with 
Parametric  Binary  Dissection* 

Shahid  H.  Bokhari 

Department  of  Electrical  Engineering 
University  of  Engineering  &  Technology 
Lahore,  Pakistan 

Abstract 

Parametric Binary Dissection (PBD) is a new algorithm that can
be used for partitioning graphs embedded in 2- or 3-dimensional
space. It partitions explicitly on the basis of $\text{nodes} + \lambda \times (\text{edges cut})$,
where $\lambda$ is the ratio of the time to communicate over an edge to the
time to compute at a node. The new algorithm is faster than the
original binary dissection algorithm and attempts to obtain better
partitions than the older algorithm, which only takes nodes into account.

We  compare  the  performance  of  parametric  dissection  with  plain 
binary  dissection  on  3  large  unstructured  3-d  meshes  obtained  from 
computational  fluid  dynamics  and  on  2  random  graphs.  We  show  that 
the  new  algorithm  can  usually  yield  partitions  that  are  substantially 
superior,  but  that  its  performance  is  heavily  dependent  on  the  input 
data. 
1 Introduction

In  order  to  fully  utilize  parallel  computers,  it  is  crucial  to  uniformly  partition 
the  domain  over  which  computations  are  to  be  performed.  This  problem  is 
known  to  be  computationally  intractable  and  a  number  of  heuristics  have 
been  developed  for  its  solution. 

Binary dissection, or orthogonal recursive partitioning, developed in 1985
by Berger & Bokhari [2, 3], is a partitioning technique that is in widespread
use [1, 5, 6]. This fast and straightforward algorithm carries out
partitioning as a series of recursive bisections that balance the load at each
step. The algorithm does not take communication costs into account and can
sometimes yield partitions that have a poor communication-to-computation ratio.

The solution of aerodynamic problems on unstructured meshes is an important
area of research within the field of computational fluid dynamics.
Unstructured meshes are graphs embedded in 2- or 3-dimensional space.
The current requirement is to solve large (≈ $10^6$-node and $10^6$-edge) problems
on parallel computers such as the Intel iPSC-860 hypercube or the PARAGON 2-d
mesh. Efficient utilization of these parallel machines requires good partitioning
of meshes over the processors of the system.

When binary dissection is used for partitioning, the nodes of the problem
mesh are uniformly distributed over all processors but, of course, no attention
is paid to the number of edges that are cut. Each edge that is cut by the
partitioning results in an inter-processor communication requirement. We
normalize the time required to compute at a node to 1, and denote by $\lambda$ the
time required to communicate over an edge. The normalized time required
by a specific partitioning of a problem mesh is then equal to the maximum
of $\text{nodes} + \lambda \times (\text{edges cut})$ over all subregions.
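As a concrete reading of this cost measure, the sketch below evaluates the normalized time of a given partition. It assumes (our assumption; the text does not spell this out) that a region's edges cut are all edges with exactly one endpoint in that region, so each cut edge is charged to both regions it joins. The function and argument names are hypothetical.

```python
from collections import Counter


def normalized_time(assignment, edges, lam):
    """Max over regions of nodes + lam * (edges cut).

    assignment[i] is the region of node i; edges is a list of (u, v) pairs.
    Assumption: each cut edge is charged to both regions it connects.
    """
    nodes = Counter(assignment)          # number of nodes in each region
    cut = Counter()                      # number of cut edges touching each region
    for u, v in edges:
        ru, rv = assignment[u], assignment[v]
        if ru != rv:
            cut[ru] += 1
            cut[rv] += 1
    return max(nodes[r] + lam * cut[r] for r in nodes)


# A 4-node path 0-1-2-3 split into two regions of 2 nodes, with 1 cut edge.
print(normalized_time([0, 0, 1, 1], [(0, 1), (1, 2), (2, 3)], lam=0.5))  # 2.5
```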

Parametric Binary Dissection (PBD) [4] is a new technique that attempts
to take communication overhead into account by partitioning on the basis of
load as well as communication cost. At each step of the dissection, an attempt
is made to minimize $\text{nodes} + \lambda \times (\text{edges cut})$ for the two subregions. A
fast algorithm for PBD is given in [4]. Since PBD becomes ordinary binary
dissection when $\lambda = 0$, this fast algorithm also solves the original
problem more rapidly.
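To make the per-step objective concrete, here is a deliberately naive sketch of a single parametric cut. It is not the fast algorithm of [4]: it assumes the cut is a plane perpendicular to one coordinate axis, placed between consecutive nodes in sorted order, and it reads "minimize for the two subregions" as minimizing the larger of the two costs. Both sides of a single cut share the same cut edges, so each side costs its node count plus $\lambda$ times the number of cut edges. The names are hypothetical.

```python
def parametric_cut(coords, edges, lam, axis):
    """Brute-force single cut perpendicular to `axis` (0=x, 1=y, 2=z).

    Returns (cost, left_nodes) for the split minimizing
    max(nodes_left, nodes_right) + lam * edges_cut, or None if fewer than 2 nodes.
    O(n * e): for illustration only.
    """
    n = len(coords)
    order = sorted(range(n), key=lambda i: coords[i][axis])
    best = None
    for k in range(1, n):                          # left = first k nodes along the axis
        left = set(order[:k])
        cut = sum((u in left) != (v in left) for u, v in edges)
        cost = max(k, n - k) + lam * cut
        if best is None or cost < best[0]:
            best = (cost, left)
    return best


coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
edges = [(0, 1), (1, 2), (1, 3), (0, 2), (0, 3)]
print(parametric_cut(coords, edges, lam=0, axis=0))  # (2, {0, 1}): plain median cut
print(parametric_cut(coords, edges, lam=1, axis=0))  # (5, {0, 1, 2}): fewer edges cut
```

With $\lambda = 0$ the sketch reduces to the balanced median cut of plain binary dissection; with $\lambda = 1$ it accepts a less balanced split because that split cuts 2 edges instead of 4.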

In this paper we evaluate the performance of Parametric Binary Dissection
on 3 unstructured meshes taken from aerodynamic applications. The
meshes that we use for our evaluation are as follows.
Mesh            Nodes     Edges      Provided by
Wing & Pod      106064    697992     Dimitri Mavriplis
F-18            316399    2106889    Clyde Gumbert
Wing & Store    121200    818066     Neil Frink

We  also  provide  results  for  the  dissection  of  two  large  random  3-d  graphs. 

We evaluate PBD by applying it to each mesh for $\lambda = 0$ and for a range of
powers of two running from below 1, through $\lambda = 1$, to above 1, for depths
of partitioning varying from 1 to about 18. Since we are dealing with binary
dissection, each level of partitioning doubles the number of regions. Thus a
depth 4 partitioning results in $2^4 = 16$ regions and would be targeted to a
16-processor system. For each partition we obtain the maximum number of
edges cut and the maximum number of nodes over all regions. The normalized
run time of a partition is then $\max \text{nodes} + \lambda \times (\max \text{edges cut})$.
For $\lambda = 0$, PBD degenerates into plain binary dissection. We can thus
compare the performance of plain and parametric dissection by dividing the
normalized run time at every depth for $\lambda = 0$ by the run time at various
non-zero values of $\lambda$. These ratios give the improvement of PBD over
plain dissection and are plotted in the following sections.
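The comparison can be written out as follows. This is a sketch under our reading of the text: per-region node and cut-edge counts are assumed to be available for both partitions, and the plain ($\lambda = 0$) partition is assumed to be evaluated with the same $\lambda$ as the PBD partition it is compared against. The function names and the sample counts are hypothetical, not data from this report.

```python
def run_time(region_nodes, region_cut, lam):
    """Normalized run time: max nodes + lam * (max edges cut), maxima over all regions."""
    return max(region_nodes) + lam * max(region_cut)


def improvement(plain, pbd, lam):
    """Ratio > 1 means PBD beats plain binary dissection at this lam."""
    return run_time(*plain, lam) / run_time(*pbd, lam)


# Hypothetical counts for a depth-1 partition (two regions).
plain = ([50, 50], [10, 10])   # balanced nodes, many cut edges
pbd = ([55, 45], [2, 2])       # less balanced, far fewer cut edges
print(improvement(plain, pbd, lam=2))  # 70 / 59, about 1.186
```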


2  Wing  and  Pod 

This mesh, provided by Dimitri Mavriplis, is based on half a fuselage, a wing
and an engine. It has 106064 nodes and 697992 edges. When $\lambda = 0$, parametric
dissection is the same as ordinary dissection and there is no performance
advantage. For depths 7-11 there is a degradation in performance, except for
large values of $\lambda$. For depths 3-6 and 11-15 there is a performance improvement,
and this increases with $\lambda$. The time taken for PBD to depth 16 on this
mesh is 230 seconds on a Sparcstation-10. This time includes 67 seconds to
input the mesh.
3 F-18


The F-18 mesh was provided by Clyde Gumbert and has 316399 nodes and
2106889 edges. The Parametric Dissection algorithm could not provide any
performance improvement for this mesh. Careful analysis of the mesh revealed
that the wings are exactly at the midpoint (in terms of nodes) of the
domain. Thus the first two cuts tend to pass through the wings which, of
course, contain no mesh elements. The following figure is a simplified
representation of a slice through the mesh (grey area) with the first cut passing
through the wing. The nose of the plane points towards the observer; only
half the plane is shown.


To verify this conclusion, we redid the experiment after applying a rotation
to the node coordinates. The rotation that we used was an arbitrarily
chosen 79° about each of the x, y and z axes, in that order.¹ The performance
on the rotated mesh is better, yielding very high performance improvements
for certain depths, but the improvement is not as dramatic as for the previous
case. The time required for a depth 18 dissection of this mesh is 505 seconds
on a 50-MHz MIPS R4000 processor; this includes 82 seconds for input.

¹ Any set of rotations large enough to move the first 2 or 3 cuts out of the body of the
plane will suffice.
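The coordinate rotation can be sketched as successive rotations by the same angle about the x, y and z axes, in that order. The right-handed sign convention and the function name are our assumptions. Any such rotation preserves distances and leaves the graph's edges unchanged, so only the positions of the dissection cuts are affected.

```python
import math


def rotate_xyz(points, degrees=79.0):
    """Rotate each (x, y, z) point by `degrees` about x, then y, then z."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    rotated = []
    for x, y, z in points:
        y, z = c * y - s * z, s * y + c * z   # about the x axis
        x, z = c * x + s * z, -s * x + c * z  # about the y axis
        x, y = c * x - s * y, s * x + c * y   # about the z axis
        rotated.append((x, y, z))
    return rotated


p = rotate_xyz([(1.0, 2.0, 3.0)])[0]
print(math.dist((0, 0, 0), p))  # length is preserved (about sqrt(14))
```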


[Figure: F-18, n=316399, e=2106889]
4 Wing and Store

This  mesh  was  provided  by  Neil  Frink  and  has  121200  nodes  and  818068 
edges.  The  algorithm  was  able  to  show  good  performance  improvements  for 
several  values  of  depth  for  this  mesh.  The  time  required  is  301  seconds  for 
depth  15  (88  seconds  input  time)  on  a  Sparcstation-10. 

[Figure: Wing & Store, n=121200, e=818060]


4.1  Wing  and  Store:  modified  algorithm 

Careful examination of the plots presented so far will show that there is no
performance improvement for depth 1. This is because the PBD implementation
we have been using ignores edges for the first cut. Ignoring edges at the first
cut is important because, as explained in [4], taking them into account usually
yields very poor partitions. However, in some cases this is not true. For
the Wing and Store mesh, taking edges into account during the first cut can
yield very large performance improvements for $\lambda > 1$.
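The difference between the original implementation and the modified variant can be illustrated with a small recursive driver. This is a sketch, not the implementation of [4]: it assumes cuts cycle through the x, y and z axes, that each region is split using only the edges internal to it, and that each split minimizes the larger of the two sides' costs. The flag `edges_on_first_cut` is a hypothetical name for the modification described above. The sweep updates the cut count incrementally as each node crosses from the right side to the left.

```python
def best_split(region, coords, edges, lam, axis):
    """Split `region` (a list of node ids) perpendicular to `axis`."""
    inside = set(region)
    order = sorted(region, key=lambda i: coords[i][axis])
    rank = {v: p for p, v in enumerate(order)}
    adj = {v: [] for v in region}
    for u, v in edges:                     # keep only edges internal to the region
        if u in inside and v in inside and u != v:
            adj[u].append(v)
            adj[v].append(u)
    n = len(order)
    best, best_k, cut = float("inf"), n // 2, 0
    for k in range(1, n):
        moved = order[k - 1]               # node that just joined the left side
        for w in adj[moved]:
            # edge to a right-side node becomes cut; edge to a left-side node is no longer cut
            cut += 1 if rank[w] >= k else -1
        cost = max(k, n - k) + lam * cut
        if cost < best:
            best, best_k = cost, k
    return order[:best_k], order[best_k:]


def dissect(coords, edges, lam, depth, edges_on_first_cut=False):
    """Return the 2**depth regions produced by recursive bisection."""
    regions = [list(range(len(coords)))]
    for level in range(depth):
        # The original implementation ignores edges (lam = 0) for the first cut.
        lam_here = lam if (level > 0 or edges_on_first_cut) else 0
        next_regions = []
        for region in regions:
            next_regions.extend(best_split(region, coords, edges, lam_here, level % 3))
        regions = next_regions
    return regions


coords = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
edges = [(0, 1), (1, 2), (1, 3), (0, 2), (0, 3)]
print(dissect(coords, edges, lam=1, depth=1))                           # [[0, 1], [2, 3]]
print(dissect(coords, edges, lam=1, depth=1, edges_on_first_cut=True))  # [[0, 1, 2], [3]]
```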


[Figure: Wing & Store, n=121200, e=818060]


5  Random  Graphs 

Parametric Binary Dissection was also run on two randomly generated graphs.
The first graph has 100000 nodes and 500287 edges. We obtain performance
improvements at depths > 6 for all but the lowest values of $\lambda$.


[Figure: Random Graph, n=100000, e=500287; horizontal axis: depth]
The second random graph has 100000 nodes and about 2.5 million edges.
Except for the smallest non-zero value of $\lambda$, there is a performance
improvement for all values of $\lambda$, though not as great as for the smaller
random graph. The improvement saturates towards a smooth curve above $\lambda = 0.5$.


[Figure: Random Graph, n=100000, e=2500458; horizontal axis: depth]
6 Conclusions

In this paper we have presented a brief and by no means exhaustive evaluation
of Parametric Binary Dissection (PBD) on unstructured meshes. Our
experimental results indicate that the performance of PBD is highly problem
dependent but that it can often provide very good improvements over plain
binary dissection. For meshes based on aircraft, the position of the wings can
be a troublesome problem, but it can be overcome to some extent by randomly
rotating the mesh points. Very good performance can sometimes be obtained
by using a slightly modified variant of the algorithm, as explained in Section 4.1.

The PBD algorithm is very fast, and it is feasible for the practitioner to
try out several partitions and choose the best one for his or her application.
It should also be recognized that it may be better to run the problem on a
smaller number of processors if a very good partition has been obtained for
a depth lower than the maximum depth possible. For example, in Section 4.1,
given a 32-processor system and $\lambda = 2$, it would be preferable to use
only 16 of these processors. This is because the depth 4 partition is 7 times
faster than the depth 5 partition, while doubling the number of processors
can at most halve the time.
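The choice of processor count in this example amounts to picking, among the depths the machine can support, the one whose partition has the smallest normalized run time. A minimal sketch, assuming normalized run times per depth are already available; the function name is hypothetical, and the relative values below only encode the 7:1 ratio quoted above, not measured data.

```python
def best_depth(times, max_depth):
    """Pick the depth (<= max_depth) whose partition has the smallest normalized run time."""
    candidates = {d: t for d, t in times.items() if d <= max_depth}
    return min(candidates, key=candidates.get)


# Relative run times encoding "depth 4 is 7 times faster than depth 5" (lambda = 2).
times = {4: 1.0, 5: 7.0}
d = best_depth(times, max_depth=5)  # 32 processors allow depth 5
print(d, 2 ** d)  # 4 16
```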